“Our intelligence is what makes us human, and AI is an extension of that quality.” — Yann LeCun, renowned computer scientist at New York University, Chief AI Scientist at Facebook, and 2018 ACM Turing Award winner
Prof Yann LeCun, renowned AI scientist and winner of the 2018 ACM Turing Award, delivered a thought-provoking public lecture at the ACM-India Annual Event on the rapidly evolving and entwined domains of Artificial Intelligence and Self-Supervised Learning. In his opening remarks, he mentioned that he would be talking about a number of different things, including “a little bit of history, followed by the basic things about deep learning”. He would then address questions about the future of AI and discuss some proposals for how we can make progress towards more intelligent machines.
This article is based on the thoughts shared by Prof LeCun in his public lecture and is a journey through the past, present, and future of AI. While the great potential of AI has also drawn its fair share of fear, we will focus our attention on the technology itself, and its promise when used as intended.
The Interdisciplinary History
Prof LeCun explained that the roots of Deep Learning do not lie entirely in computer science, but rather in what used to be called cybernetics, which today is more a part of engineering in general. The field goes back to the 1940s, when people had the idea of using the brain as inspiration for the earliest work in this domain. It became clear very quickly that the ability to adapt and learn is a crucial characteristic of biological systems. At the same time, a lot of work in neuroscience demonstrated that adaptation in the brain works by modifying the connections between neurons.
“This led to the invention of the Perceptron in the 1950s. It was sort of one of the first learning machines capable of running simple tasks, which was quite impressive for that time. It also had a huge influence in the sense that it kind of laid the groundwork for what became a field known as statistical pattern recognition and later changed into the standard model of machine learning,” he expressed.
Perceptron – The Way Towards Supervised Learning
The perceptron was a physical machine (an electronic system), not a program running on a computer. It was cumbersome, and people soon realized that it would be easier to simulate it on a computer. This shift laid the foundation of supervised learning – you can train a machine from examples instead of programming it explicitly. Say you have a system, which you can think of as a function whose input-output relationship is defined by parameters, symbolized by knobs. Now you want to train it to distinguish images of cars from images of airplanes. When you show it a car and the system says it’s an airplane, you adjust the knobs so that the output gets closer to the one you want. This process works really well in situations where we have lots of data, such as speech, image, and face recognition, finding objects in an image, generating captions for pictures, or translating content from one language to another. Supervised learning forms the basis of a host of new applications popping up every day.
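A minimal sketch of this “adjust the knobs” idea in Python, with a toy two-dimensional dataset standing in for the car and airplane images (everything below is illustrative, not code from the lecture):

```python
# Perceptron-style supervised learning: the "knobs" are the weights, and they are
# nudged only when the machine gives the wrong answer on a training example.
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: 2-D feature vectors labelled +1 ("car") or -1 ("airplane")
# according to a simple linear rule the perceptron can discover.
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

w = np.zeros(2)   # the adjustable "knobs"
b = 0.0

for epoch in range(20):
    for xi, yi in zip(X, y):
        if yi * (xi @ w + b) <= 0:   # the output disagrees with the desired one
            w += yi * xi             # move the knobs toward the desired output
            b += yi

accuracy = np.mean(np.sign(X @ w + b) == y)
print(f"training accuracy: {accuracy:.2f}")
```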
“A little more abstractly, the standard model of pattern recognition that came out of the Perceptron is the fact that you process the input through an extractor that is going to extract relevant characteristics of this input. This extractor, in the classical model, is hand-engineered. Then you have a classifier on top of it, which has those adjustable parameters and is the only part of the system that is trainable. You train it using a training set, which consists of pairs of inputs and desirable outputs. You measure the performance of the machine by computing the difference between the output you get and the desired output,” he elaborated further.
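As a rough illustration of that classical pipeline (a fixed, hand-engineered extractor feeding the only trainable part, a classifier scored against the desired outputs), here is a hedged sketch; the features and data are invented purely for illustration:

```python
# Classical pattern recognition: hand-engineered features, trainable linear classifier.
import numpy as np

def extract_features(image):
    """Hand-engineered extractor: a few crude global statistics of a 2-D image."""
    return np.array([image.mean(), image.std(), np.abs(np.diff(image, axis=0)).mean()])

rng = np.random.default_rng(0)
images = rng.random((100, 8, 8))                           # toy 8x8 "images"
targets = (images.mean(axis=(1, 2)) > 0.5).astype(float)   # desired outputs

X = np.stack([extract_features(im) for im in images])      # fixed, not trained
w, b = np.zeros(X.shape[1]), 0.0                           # the only trainable part

for _ in range(500):                       # plain gradient descent on squared error
    err = X @ w + b - targets              # output minus desired output
    w -= 0.1 * X.T @ err / len(X)
    b -= 0.1 * err.mean()

print("mean squared error:", np.mean((X @ w + b - targets) ** 2))
```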
What’s The Role of Deep Learning In All This?
It is a very simple and powerful idea: instead of separating the extractor from the classifier and only training the classifier, we attempt to train the entire system end-to-end. Basically, we build and optimize the system as a cascade of parameterized modules, each of which has trainable parameters. The result is that the system learns its own extractor, so there is no need to hand-design this component. The trade-off is that it requires more training samples than the previous scenario.
“Now, if you are a theorist, you might ask why we need all those layers, because there are theorems showing that you can approximate any function you want, as closely as you want, with just two layers. There is no real rigorous theoretical answer, but there is a lot of intuitive and empirical evidence. The reason you need layers is that the natural world is compositional – there is a natural hierarchy of abstractions that describes this world,” described Prof LeCun.
Why is it this way? We don’t know. Prof LeCun thinks it corresponds to the famous quote by Albert Einstein – the most incomprehensible thing about the world is that it is comprehensible. The reason the world is understandable is its compositionality. Take the Universe, which has at its bottom level the elementary particles that form atoms, which in turn make molecules, materials, objects, and so on. It is the same for the perceptual world – if you want to recognize images, you start with pixels. Pixels assemble into oriented edges and contours, which in turn form motifs, which join to form parts of objects, and eventually, in the end, you have a scene. With more layers, you get more powerful representations and can express a richer class of functions.
Deep Learning is building a system by assembling parameterized modules into a (possibly dynamic) computation graph and training it to perform a task by optimizing the parameters using a gradient-based method. A lot of people talk about the limitations of this area, but most of those critiques are really the limitations of Supervised Learning and not Deep Learning per se.
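A minimal PyTorch sketch of this definition: a cascade of parameterized modules trained end-to-end with a gradient-based optimizer, so that the early layers learn their own extractor. The data and layer sizes are toy choices, not anything shown in the lecture:

```python
import torch
from torch import nn

model = nn.Sequential(              # the cascade of parameterized, trainable modules
    nn.Linear(32, 64), nn.ReLU(),   # these layers play the role of the learned extractor
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),               # final classifier layer
)

x = torch.randn(256, 32)            # toy inputs
y = (x.sum(dim=1) > 0).long()       # toy desired outputs
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)     # compare the output with the desired output
    loss.backward()                 # gradients flow back through every module
    optimizer.step()                # adjust all parameters end-to-end

print("final loss:", loss.item())
```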
Drawing Inspiration From Biology
If you want to recognize an image, the input is very high-dimensional – a few hundred pixels by a few hundred pixels. So, how are we going to build multi-layer neural networks that can handle it?
According to Prof LeCun, “That’s when inspiration from biology comes in. In the 60s, there was classic, Nobel-Prize-winning work by Hubel and Wiesel in neuroscience, where they discovered that neurons in the visual cortex look at localized receptive fields. So, these local feature detectors get activated by particular patterns in particular areas. These neurons are present across the entire visual field. It suggests a type of computation in the visual system that we could reproduce. The second discovery by Hubel and Wiesel is the idea of complex cells. A complex cell is a neuron that integrates the activations of several detector cells in the previous layer and, basically, pools their answers. The purpose of this is to build a little bit of shift invariance into the representation.”
Kunihiko Fukushima (a renowned Japanese computer scientist) had the idea of building a computer model of this architecture back in the late 1970s and early 1980s, which he called the Neocognitron. Since he did not have the back-propagation algorithm to train it, he used all kinds of unsupervised algorithms. A few years later, Prof LeCun came up with the idea of using very similar architectures and training them with back-propagation, and that is what a Convolutional Network is. After applying it to handwritten digit and character recognition, he soon realized that such systems could also identify a group of digits without having to separate them from each other (explicit segmentation) in advance. Convolutional networks now find applications in all possible areas: pedestrian detection, robotics, and self-driving cars, to name a few. Readers interested in a visual exploration of convolutional networks would likely enjoy this exposition by Grant Anderson.
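A hedged PyTorch sketch of a small convolutional network in the spirit described above: convolutions act as local feature detectors replicated across the image (the simple cells), and pooling layers aggregate their responses to gain a little shift invariance (the complex cells). The layer sizes are arbitrary and purely illustrative:

```python
import torch
from torch import nn

class TinyConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=5),   # local receptive fields, shared weights
            nn.ReLU(),
            nn.MaxPool2d(2),                  # pooling: tolerance to small shifts
            nn.Conv2d(8, 16, kernel_size=5),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 4 * 4, num_classes)

    def forward(self, x):
        h = self.features(x)                  # learned, multi-stage feature extractor
        return self.classifier(h.flatten(start_dim=1))

digits = torch.randn(4, 1, 28, 28)            # a toy batch of 28x28 "digit" images
print(TinyConvNet()(digits).shape)            # torch.Size([4, 10])
```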
The Deep Learning Revolution
The lecture proceeded to inform the audience about breakthrough research by scientists headed by Geoffrey Everest Hinton at the University of Toronto, who came up with a very efficient implementation of convolutional networks on GPUs. They were able to run a very large convolutional net and train it on ImageNet, an image dataset with 1.3 million samples. It became clear that these networks thrive when the dataset is enormous, which is exactly where other machine learning methods start crumbling. This led to a considerable improvement in the performance of vision systems on ImageNet, from 26% to 16% error, and created a revolution.
There is a lot of interest in industry in these networks owing to their numerous uses. Tech giants such as Google, Microsoft, and Facebook continuously try to find architectures that are efficient and compact in terms of memory footprint while maintaining high performance on datasets like ImageNet.
“The nice thing about the trends in AI is that a lot of research takes place in the open. So, at Facebook, we tend to open-source pretty much everything we do. The latest pre-trained image recognition system here is called Detectron2, and it integrates a lot of different methods. You can download and re-train it,” said Prof LeCun.
Convolutional networks also serve human welfare through medical image analysis. Scientists at NYU used 3D convolutional nets to segment the femur in MRI scans. MRI scans are 3D volumes, and a person normally has to flip through the slices to view one entirely; the network looks at the entire volume at once, works better, and helps doctors planning hip surgeries. Similarly, people have applied such networks to detect breast cancer in mammograms (sets of 2D images) with reasonable accuracy. NYU and Facebook have also collaborated on accelerating the process of gathering data from MRI machines, so that instead of lying in a noisy MRI machine for half an hour, the examination takes only five minutes or so. It uses Deep Learning to do image restoration.
ConvNets are also gaining popularity in many other fields, such as physics, chemistry, and environmental science, among a long list of others. Experimental physicists now use them as phenomenological models of the phenomena they observe: the statistical properties of things you observe in space, for example, are often difficult to model from first principles, yet you can train a neural net to predict them. Other applications include accelerating the solution of partial differential equations, effective climate modeling, and a lot of work in astrophysics and high-energy physics.
Deep Learning Saves Lives
Deep Learning is an innovation that has revolutionized computer perception and control. Done well, it works reliably and saves lives. Examples include automated emergency braking systems, medical image analysis, and predicting cosmological structure formation. Likewise, it is critical to have sound systems that detect and filter (suppress or downgrade) online content such as hate speech, weapon sales, and calls for violence.
Prof LeCun also brought up Reinforcement Learning, a field that has gained sudden popularity in the last few years. The idea is that you do not tell the machine the correct answer; instead, after each trial, you inform it whether what it did was good or bad through an appropriate incentive scheme – positive “rewards” for a good move and negative feedback for a bad one. This approach has proven very successful for games (for example, it is a popular framework for RoboCup teams) because there you can run it millions of times. It works well in simulation but not nearly as well in the real world, since it requires an impractical number of trials.
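A hedged sketch of that reward-driven idea in its simplest form, a multi-armed bandit rather than the full reinforcement-learning setting from the lecture: the agent is never shown the correct action, only a reward after each trial, and even this tiny problem takes thousands of trials:

```python
import numpy as np

rng = np.random.default_rng(0)
true_payoffs = np.array([0.2, 0.5, 0.8])      # hidden quality of each action
values = np.zeros(3)                          # the agent's estimates
counts = np.zeros(3)
epsilon = 0.1                                 # fraction of trials spent exploring

for trial in range(10_000):
    if rng.random() < epsilon:
        action = int(rng.integers(3))         # explore a random action
    else:
        action = int(values.argmax())         # exploit the current belief
    reward = float(rng.random() < true_payoffs[action])           # good (1) or bad (0)
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]  # running average

print("estimated values:", values.round(2))   # should approach the hidden payoffs
```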
“That’s a question we need to find an answer to – how to make it work in the real world – so that we can make real progress in AI. Current Deep Learning methods still cannot give us machines with common sense, intelligent personal assistants, smart chatbots, household robots, agile and dexterous robots and Artificial General Intelligence (AGI),” he said.
The Three Challenges
The community needs to work on three problems. The first is learning with fewer trials; the concept of self-supervised learning can address it, by learning to represent the world not through training on a task but through observation. The second is learning to reason: what if we need machines to reflect on a particular course of action before taking it? We need to implement reasoning in Deep Learning in ways that are compatible with gradient-based learning. The third is learning to plan complex action sequences, and this requires some more consideration.
The Magic Of Self-Supervised Learning
The reason animals and humans learn things so quickly is observation. We learn models of the world by prediction, which is the essence of intelligence. A possible way to approach this is self-supervised learning – predicting everything from everything else. It means training a machine to fill in the blanks, predicting missing information from whatever is available. Why is this useful? Because if a system does a good job of predicting what is going to happen next, it probably has a pretty good grasp of the rules of the world.
In the words of Prof LeCun, “Self-supervised learning has, in the past year and a half, been unbelievably successful in the context of natural language processing. It gave people the idea of using this kind of learning for learning features of images and videos as well. But in this case it produces blurry predictions, because the system cannot predict exactly what is going to happen next. The solution is latent-variable Energy-Based Models, which allow systems to make multiple predictions. All of this led me to the idea that self-supervised learning is vital, because with each training sample you show the machine, you are asking it to predict a lot. So you are giving it a massive amount of information, and it can learn a lot about how the world works just by trying to make these predictions. You get much more information per trial with this form of learning than you do with Supervised Learning or Reinforcement Learning. The only problem is that it can be unreliable, because there is uncertainty in the world, and we need to solve this issue.”
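A minimal PyTorch sketch of the “fill in the blanks” idea: random components of each input are masked out, and the network is trained to reconstruct the missing values from the visible ones. The data, masking scheme, and architecture are toy assumptions for illustration, not the energy-based models from the lecture:

```python
import torch
from torch import nn

dim = 16
model = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def sample_batch(n=128):
    """Toy 'world' with structure to discover: all dimensions share one latent factor."""
    base = torch.randn(n, 1)
    return base + 0.1 * torch.randn(n, dim)

for step in range(2000):
    x = sample_batch()
    mask = (torch.rand_like(x) < 0.25).float()   # hide about 25% of each input
    pred = model(x * (1 - mask))                 # blanks are replaced by zeros
    loss = ((pred - x) ** 2 * mask).sum() / mask.sum().clamp(min=1)  # score only the blanks
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print("reconstruction loss on the blanks:", loss.item())
```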
So, the next revolution in Artificial Intelligence will not be supervised or reinforced, but will probably be Self-Supervised!
With his proposal for energy-based models, the take-home message was that self-supervised learning, along with some form of unsupervised learning, is the future of AI. It will allow us to train massive networks, because data will be very cheap, high-dimensional, and informative, and it will help us learn hierarchical features for low-resource tasks. It will also allow us to learn forward models of the world and build very efficient control systems.